Search for: All records

Creators/Authors contains: "Hellendoorn, Vincent"

  1. Large language models have shown a propensity for generating correct, multi-line programs from natural language prompts. Given past findings that bugs and patches can be distinguished by their predictability under simple language models, it is natural to ask whether modern, large neural models lend themselves especially well to program repair without any calibration. We study this in the context of one-line bugs, providing a series of models of varying scales (from 160M to 12B parameters) with the context preceding a buggy line in 72 Java and Python programs and analyzing the rank at which the correct patch (and the original buggy line) is generated, if at all. Our results highlight a noticeable correlation between model size and both test-passing accuracy and patch-ranking quality, as well as several other findings related to the differences between the two languages and the propensity of especially the largest models to generate candidate patches that closely resemble (if not exactly match) the original developer patch. A hedged sketch of this patch-ranking setup follows the result list below.
  2. Compiler fuzzing tools such as Csmith have uncovered many bugs in compilers by randomly sampling programs from a generative model. The success of these tools is often attributed to their ability to generate unexpected corner-case inputs that developers tend to overlook during manual testing. At the same time, their chaotic nature makes fuzzer-generated test cases notoriously hard to interpret, which has led to the creation of input simplification tools such as C-Reduce (for C compiler bugs). In previously unrelated work, researchers have also shown that human-written software tends to be rather repetitive and predictable to language models. Studies show that developers deliberately write more predictable code, whereas code with bugs is relatively unpredictable. In this study, we ask the natural question of whether this high predictability, perhaps counter-intuitively, also applies to fuzzer-generated code. That is, we investigate whether fuzzer-generated compiler inputs are deemed unpredictable by a language model built on human-written code, and we surprisingly conclude that they are not. On the contrary, Csmith-generated programs are more predictable on a per-token basis than human-written C programs. Furthermore, bug-triggering inputs tended to be more predictable still than random inputs, and the C-Reduce minimization tool did not substantially increase this predictability. Rather, we find that bug-triggering inputs are unpredictable relative to Csmith's own generative model. This is encouraging; our results suggest promising research directions on incorporating predictability metrics into the fuzzing and reduction tools themselves. A hedged sketch of the per-token predictability measurement follows the result list below.
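
The following is a minimal sketch of the patch-ranking setup described in result 1: a causal language model scores each candidate one-line patch by its log-likelihood given the context preceding the buggy line, and candidates are ranked by that score. The model name, the scoring details, and the example candidates are illustrative assumptions, not the exact models or protocol used in the study.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "gpt2"  # placeholder; the study spans models from 160M to 12B parameters

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    def line_log_likelihood(context: str, candidate_line: str) -> float:
        """Log-likelihood of a candidate one-line patch given the preceding context."""
        context_ids = tokenizer(context, return_tensors="pt").input_ids
        candidate_ids = tokenizer(candidate_line, return_tensors="pt").input_ids
        input_ids = torch.cat([context_ids, candidate_ids], dim=1)
        with torch.no_grad():
            logits = model(input_ids).logits
        # Next-token distributions, aligned so position i predicts token i + 1.
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
        start = context_ids.shape[1] - 1                # first prediction of a candidate token
        targets = input_ids[0, context_ids.shape[1]:]   # the candidate's tokens themselves
        scores = log_probs[0, start:start + targets.shape[0]].gather(1, targets.unsqueeze(1))
        return scores.sum().item()

    # Rank hypothetical candidate patches; a higher score means the model
    # finds the candidate line more predictable in context.
    context = "def is_even(n):\n    return "
    candidates = ["n % 2 == 0\n", "n % 2 == 1\n", "n / 2 == 0\n"]
    for line in sorted(candidates, key=lambda c: line_log_likelihood(context, c), reverse=True):
        print(repr(line))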
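
Similarly, the following is a minimal sketch of the per-token predictability measurement discussed in result 2: a language model of (ostensibly human-written) code reports its average per-token cross-entropy on a program, where lower values mean the program is more predictable. The model name and the two C snippets are illustrative placeholders; the corpora and language models used in the study may differ.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "gpt2"  # placeholder stand-in for a model trained on human-written code

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    def mean_token_cross_entropy(source: str) -> float:
        """Average negative log-likelihood per token; lower means more predictable."""
        ids = tokenizer(source, return_tensors="pt").input_ids
        with torch.no_grad():
            # Passing the inputs as labels makes the model return the mean
            # cross-entropy over all of its next-token predictions.
            loss = model(ids, labels=ids).loss
        return loss.item()

    human_written = "int max(int a, int b) { return a > b ? a : b; }"
    fuzzer_like = "static int g_7 = 6L; int f(void) { g_7 = (g_7 ^ 3) + 1; return g_7; }"

    print("human-written style:", mean_token_cross_entropy(human_written))
    print("fuzzer-like style  :", mean_token_cross_entropy(fuzzer_like))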